Implement the migration as a batched, stream-like pipeline: read users in small chunks from SQL, transform each batch, and insert it with unordered bulkWrite() calls of controlled size to maximize throughput while keeping memory usage constant.
Migrating 1 million records efficiently requires moving away from loading all data into memory at once. The core strategy is to process records in small, fixed-size batches, which keeps memory usage constant regardless of total data volume. You'll combine this with MongoDB's bulkWrite() for efficient insertion and carefully manage backpressure between reading from SQL and writing to MongoDB. The key lesson from production experience is that fetching all records at once on either side (an unbounded SELECT into an array, or toArray() on a MongoDB cursor) will crash your process with an out-of-memory error. Instead, you must treat the migration as a controlled data pipeline.
The first critical rule: never SELECT * FROM table without limits. That would load 1 million rows into your Node.js memory at once, likely exceeding the default 1.5GB V8 heap limit. Instead, use SQL pagination with LIMIT and OFFSET, or for better performance with large offsets, use keyset pagination (e.g., WHERE id > lastId ORDER BY id). This ensures you only hold one batch of records (e.g., 1,000 rows) in memory at a time. After processing each batch, allow the garbage collector to reclaim memory by letting the batch variable go out of scope.
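A minimal reader sketch, assuming a mysql2/promise connection pool plus a users table with an indexed numeric id column (all names here are illustrative, not part of the original setup):

```js
// Sketch: keyset-paginated reader. `pool` is a mysql2/promise pool
// (creation shown later in the pooling sketch).
async function* readUserBatches(pool, batchSize = 1000) {
  let lastId = 0;
  for (;;) {
    // Keyset pagination: WHERE id > lastId stays fast where large OFFSETs slow down.
    const [rows] = await pool.query(
      'SELECT * FROM users WHERE id > ? ORDER BY id LIMIT ?',
      [lastId, batchSize]
    );
    if (rows.length === 0) return;       // nothing left to read
    lastId = rows[rows.length - 1].id;   // remember the last key for the next query
    yield rows;                          // only this one batch is held in memory
  }
}
```

Because the batch is yielded and then replaced on the next iteration, each chunk goes out of scope naturally and can be garbage-collected.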
Individual insertOne() calls for each record would be prohibitively slow due to network round-trip overhead. Instead, use MongoDB's bulkWrite() to send many operations in a single network call. The driver automatically splits large batches to stay within the 16MB BSON limit and the server's maxWriteBatchSize (100,000 operations). Set ordered: false so MongoDB can execute operations in parallel and continue even if individual inserts fail, which maximizes throughput for independent records. You still control how many operations you pass to each bulkWrite() call yourself: 1,000 to 5,000 per call is a proven range that balances memory usage and performance.
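A minimal writer sketch using the official mongodb driver (the collection handle and document shape are assumptions):

```js
// Sketch: one batched write. `collection` comes from client.db(...).collection(...).
async function writeBatch(collection, documents) {
  const ops = documents.map((doc) => ({ insertOne: { document: doc } }));
  // ordered: false lets the server process the operations in parallel and
  // continue past individual failures instead of aborting the whole batch.
  const result = await collection.bulkWrite(ops, { ordered: false });
  return result.insertedCount;
}
```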
The orchestrator connects the batched SQL reader and the batched MongoDB writer. It runs a loop: read a batch from SQL, transform the records (field mapping, data type conversion), write the batch to MongoDB, and track progress. Critically, you should introduce a small awaited delay between batches (e.g., ~10ms via setTimeout wrapped in a Promise) to avoid overwhelming the target database and to let pending write operations settle. This also helps prevent driver-level connection pool exhaustion. The loop continues until SQL has no more records.
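Putting the pieces together, a sketch of the loop, assuming readUserBatches and writeBatch from above plus a hypothetical transformUser(row) function that does your field mapping:

```js
// Sketch: the orchestration loop tying reader, transform, and writer together.
async function migrate(pool, collection) {
  let migrated = 0;
  for await (const rows of readUserBatches(pool, 1000)) {
    const documents = rows.map(transformUser);            // per-batch transformation
    migrated += await writeBatch(collection, documents);  // unordered bulk insert
    console.log(`migrated ${migrated} users`);
    // Small awaited pause so pending writes settle and the pool isn't exhausted.
    await new Promise((resolve) => setTimeout(resolve, 10));
  }
}
```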
Batch size tuning: Start with 1,000 documents per batch. Monitor MongoDB server CPU and disk I/O. Increase to 5,000 if the server can handle it, but back off if the bulkWrite results start showing write errors or timeouts.
Unordered bulk writes: Always set ordered: false. This allows parallel execution and continues despite individual document failures, dramatically improving throughput.
Index strategy: Temporarily drop non-essential indexes before migration and rebuild afterward. Each index adds write overhead during inserts. Keep only the _id index during migration.
Connection pooling: Configure both MySQL and MongoDB pools with appropriate limits (e.g., connectionLimit: 10 for MySQL, maxPoolSize: 10 for MongoDB) to avoid exhausting database connections.
Error handling and resume: Store the last processed ID after each batch. If the script crashes, you can resume from that ID rather than restarting from zero. Use a simple text file or a tracking collection in MongoDB.
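To make the pooling and resume points above concrete, here is a minimal sketch (connection details, option values, and the checkpoint file path are all placeholders):

```js
// Sketch: bounded connection pools plus a simple file-based checkpoint.
const fs = require('fs/promises');
const mysql = require('mysql2/promise');
const { MongoClient } = require('mongodb');

const pool = mysql.createPool({
  host: 'localhost',
  user: 'app',
  password: 'secret',
  database: 'legacy',
  connectionLimit: 10,           // cap concurrent MySQL connections
});

const mongoClient = new MongoClient('mongodb://localhost:27017', {
  maxPoolSize: 10,               // cap concurrent MongoDB connections
});

const CHECKPOINT_FILE = './migration.checkpoint';

async function loadCheckpoint() {
  try {
    return Number(await fs.readFile(CHECKPOINT_FILE, 'utf8')) || 0;
  } catch {
    return 0;                    // no checkpoint yet: start from the beginning
  }
}

async function saveCheckpoint(lastId) {
  await fs.writeFile(CHECKPOINT_FILE, String(lastId));
}
```

Call saveCheckpoint(lastId) after each successful batch and seed the reader's starting id from loadCheckpoint() so a crashed run can resume where it left off.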
Older 64-bit Node.js versions default to a heap limit of roughly 1.5GB (newer versions derive the limit from available system memory). By processing in batches, you stay far below this limit either way. You can also help the garbage collector by explicitly dereferencing large objects after use (e.g., rows = null; documents = null;). If you encounter memory pressure, consider calling global.gc() when running Node.js with --expose-gc, though this is rarely needed with proper batching. Monitor memory usage by periodically logging process.memoryUsage().heapUsed to verify flat memory consumption.
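A small monitoring sketch (interval and formatting are arbitrary):

```js
// Sketch: periodic heap logging to confirm memory stays flat for the whole run.
const memLog = setInterval(() => {
  const mb = (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(1);
  console.log(`heapUsed: ${mb} MB`);
}, 5000);
// clearInterval(memLog) once the migration finishes.
```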
Production migrations often involve additional complexity: data transformations (field renaming, type coercion, denormalization), handling duplicate keys with upserts, and dealing with referential integrity. The batched pattern extends to all these cases. For transformations, apply them per batch before the bulk write. For upserts, change the operation to updateOne with upsert: true and a filter that matches existing documents (e.g., on a migrated legacyId field). The batching logic remains identical.
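For the upsert variant, a sketch assuming each transformed document carries its old primary key as a legacyId field and the target collection has an index on it:

```js
// Sketch: idempotent upsert batch keyed on the migrated legacyId field.
async function upsertBatch(collection, documents) {
  const ops = documents.map((doc) => ({
    updateOne: {
      filter: { legacyId: doc.legacyId }, // match an already-migrated document
      update: { $set: doc },
      upsert: true,                       // insert when nothing matches
    },
  }));
  const result = await collection.bulkWrite(ops, { ordered: false });
  return result.upsertedCount + result.modifiedCount;
}
```

Swapping writeBatch for upsertBatch in the orchestrator makes the migration safe to re-run against a partially populated collection.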